Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UUID comparisons to conform to IETF RFC 4122 #23847

Merged
merged 1 commit into from
Nov 6, 2024

Conversation

ZacBlanco
Copy link
Contributor

@ZacBlanco ZacBlanco commented Oct 16, 2024

Description

Reverse UUIDs bytes internally to make comparison operators conform to IETF RFC 4122

Motivation and Context

The presto documentation states that we support UUIDs and conform to RFC 41221:

Before this change, UUIDs were read in as two longs in big endian format and used the that byte order for comparisons. However, the java bytewise comparison in our io.airlift.slice dependency assumes the backing values are in little endian format, so the bytes are swapped during the comparison2. This made comparisons operators between UUIDs incorrect according the RFC 41223 §3 under "Rules for Lexical Equivalence" on p.5.

UUIDs, as defined in this document, can also be ordered lexicographically.
For a pair of UUIDs, the first one follows the second if the most significant
field in which the UUIDs differ is greater for the first UUID. The second
follows the first if the most significant field in which the UUIDs differ
is greater for the second UUID.

Note that RFC 41223 has an errata in the paragraph describing the lexicographic comparison in which the original text is inconsistent. The corrected text can be found in the errata EID 14284 and is reproduced above for easier reference.

Additionally, RFC 9562 has been published which, supersedes 4122. It defines the sorting rules as a simple byte-wise comparison in §6.115

UUIDv6 and UUIDv7 are designed so that implementations that require sorting (e.g., database indexes) sort as opaque raw bytes without the need for parsing or introspection.

For example, before this change, the following comparison of UUIDs would result in a TRUE result:

presto:tpch> SELECT UUID '00000000-0000-0000-1000-000000000000' < UUID '00000000-0000-0000-0000-000000000001';
 _col0
-------
 true
(1 row)

This seems to be incorrect because the reading of the 00000000-0000-0000-1000-000000000000 UUID (say, UUID "A") appears to have a 1 byte in a more significant position than the 1 byte in the UUID 00000000-0000-0000-0000-000000000001 (say, UUID "B"). Because A has a 1 byte in a more significant position than B, this comparison should evaluate to FALSE.

In addition, when testing this same comparison in postgres (for which we also support the UUID type), postgres returns results which are inconsistent with Presto.

postgres=# SELECT uuid '00000000-0000-0000-1000-000000000000' < uuid '00000000-0000-0000-0000-000000000001';
 ?column?
----------
 f
(1 row)
Additional verification on ordering from postgres

SELECT * FROM (VALUES  
    (uuid '00000000-0000-0000-0000-000000000000'),
    (uuid '00000000-0000-0000-0000-000000000001'),
    (uuid '00000000-0000-0000-0000-000000000010'),
    (uuid '00000000-0000-0000-0000-000000000100'),
    (uuid '00000000-0000-0000-0000-000000001000'),
    (uuid '00000000-0000-0000-0000-000000010000'),
    (uuid '00000000-0000-0000-0000-000000100000'),
    (uuid '00000000-0000-0000-0000-000001000000'),
    (uuid '00000000-0000-0000-0000-000010000000'),
    (uuid '00000000-0000-0000-0000-000100000000'),
    (uuid '00000000-0000-0000-0000-001000000000'),
    (uuid '00000000-0000-0000-0000-010000000000'),
    (uuid '00000000-0000-0000-0000-100000000000'),
    (uuid '00000000-0000-0000-0001-000000000000'),
    (uuid '00000000-0000-0000-0010-000000000000'),
    (uuid '00000000-0000-0000-0100-000000000000'),
    (uuid '00000000-0000-0000-1000-000000000000'),
    (uuid '00000000-0000-0001-0000-000000000000'),
    (uuid '00000000-0000-0010-0000-000000000000'),
    (uuid '00000000-0000-0100-0000-000000000000'),
    (uuid '00000000-0000-1000-0000-000000000000'),
    (uuid '00000000-0001-0000-0000-000000000000'),
    (uuid '00000000-0010-0000-0000-000000000000'),
    (uuid '00000000-0100-0000-0000-000000000000'),
    (uuid '00000000-1000-0000-0000-000000000000'),
    (uuid '00000001-0000-0000-0000-000000000000'),
    (uuid '00000010-0000-0000-0000-000000000000'),
    (uuid '00000100-0000-0000-0000-000000000000'),
    (uuid '00001000-0000-0000-0000-000000000000'),
    (uuid '00010000-0000-0000-0000-000000000000'),
    (uuid '00100000-0000-0000-0000-000000000000'),
    (uuid '01000000-0000-0000-0000-000000000000'),
    (uuid '10000000-0000-0000-0000-000000000000')
) as t(u) order by u desc;

result

                 u
--------------------------------------
 10000000-0000-0000-0000-000000000000
 01000000-0000-0000-0000-000000000000
 00100000-0000-0000-0000-000000000000
 00010000-0000-0000-0000-000000000000
 00001000-0000-0000-0000-000000000000
 00000100-0000-0000-0000-000000000000
 00000010-0000-0000-0000-000000000000
 00000001-0000-0000-0000-000000000000
 00000000-1000-0000-0000-000000000000
 00000000-0100-0000-0000-000000000000
 00000000-0010-0000-0000-000000000000
 00000000-0001-0000-0000-000000000000
 00000000-0000-1000-0000-000000000000
 00000000-0000-0100-0000-000000000000
 00000000-0000-0010-0000-000000000000
 00000000-0000-0001-0000-000000000000
 00000000-0000-0000-1000-000000000000
 00000000-0000-0000-0100-000000000000
 00000000-0000-0000-0010-000000000000
 00000000-0000-0000-0001-000000000000
 00000000-0000-0000-0000-100000000000
 00000000-0000-0000-0000-010000000000
 00000000-0000-0000-0000-001000000000
 00000000-0000-0000-0000-000100000000
 00000000-0000-0000-0000-000010000000
 00000000-0000-0000-0000-000001000000
 00000000-0000-0000-0000-000000100000
 00000000-0000-0000-0000-000000010000
 00000000-0000-0000-0000-000000001000
 00000000-0000-0000-0000-000000000100
 00000000-0000-0000-0000-000000000010
 00000000-0000-0000-0000-000000000001
 00000000-0000-0000-0000-000000000000

Impact

  • Fixes comparisons to conform to the UUID RFC. Technically breaks compatibility between versions as comparisons now behave differently

Test Plan

  • Additional test cases for comparisons

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Improve UUID comparisons so they conform to `IETF RFC 4122 <https://datatracker.ietf.org/doc/html/rfc4122>`_. :pr:`23847`

Footnotes

  1. https://prestodb.io/docs/0.289/language/types.html#uuid-type

  2. https://github.com/airlift/slice/blob/8f0494bdaad91f0c57f03e09aad2d77f955cfe42/src/main/java/io/airlift/slice/Slice.java#L1331-L1340

  3. https://datatracker.ietf.org/doc/html/rfc4122#section-3 2

  4. https://www.rfc-editor.org/errata/eid1428

  5. https://datatracker.ietf.org/doc/html/rfc9562#section-6.11

@tdcmeehan tdcmeehan self-assigned this Oct 16, 2024
@ZacBlanco ZacBlanco force-pushed the upstream-uuid-big-endian branch from 347c794 to e66c3dc Compare October 16, 2024 19:26
@steveburnett
Copy link
Contributor

Thanks for the release note entry! A couple of nit suggestions to follow the phrasing in the Order of changes in the Release Notes Guidelines, add the PR number, and consider adding a link to the RFC. I found this link I used in the suggestion, if you have a better link please use it instead.

== RELEASE NOTES ==

General Changes
* Improve UUID comparisons so they are now performed correctly according to `IETF RFC 4122 <https://datatracker.ietf.org/doc/html/rfc4122>`_. :pr:`23847`

@ZacBlanco ZacBlanco force-pushed the upstream-uuid-big-endian branch from e66c3dc to 1c0504e Compare October 16, 2024 21:25
@ZacBlanco ZacBlanco force-pushed the upstream-uuid-big-endian branch from 1c0504e to ffbccc5 Compare October 16, 2024 21:59
@ZacBlanco
Copy link
Contributor Author

Thanks as usual for your suggestions @steveburnett 😄

@ZacBlanco ZacBlanco marked this pull request as ready for review October 17, 2024 05:27
@ZacBlanco ZacBlanco requested review from elharo and a team as code owners October 17, 2024 05:28
@ZacBlanco ZacBlanco requested a review from presto-oss October 17, 2024 05:28
@ZacBlanco ZacBlanco changed the title Use big-endian representation for UUIDs Fix UUID comparisons to conform to IETF RFC 4122 Oct 17, 2024
Copy link
Contributor

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and thanks for the great writeup @ZacBlanco . I also tested these changes along with the native changes at facebookincubator/velox#10791 and the results match.

@BryanCutler
Copy link
Contributor

@ZacBlanco is it possible to better document the format of a Presto UUID, as it is stored in a Slice? With these changes, it will be 2 longs in little endian format, ordered by [MSB, LSB]. I would need to compensate for this in my PR for serializing UUID values from Presto Cpp at facebookincubator/velox#11197.

@ethanyzhang
Copy link
Contributor

@tdcmeehan can we merge?

@tdcmeehan tdcmeehan merged commit 47b25c1 into prestodb:master Nov 6, 2024
57 checks passed
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Nov 7, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
This ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
@ZacBlanco
Copy link
Contributor Author

@BryanCutler following up with a PR to add javadoc here: #23961

BryanCutler added a commit to BryanCutler/velox that referenced this pull request Nov 21, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Nov 21, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 2, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 3, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 5, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 10, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 14, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 14, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
BryanCutler added a commit to BryanCutler/velox that referenced this pull request Dec 17, 2024
This adds UUID comparison functions that were previously unsupported.
Functions added are <, >, <=, >=. Equal was already supported. The
ordering is done lexicographically and conforms to IETF [RFC 4122].
The ordering also matches Presto Java after #[23847].

[RFC 4122]: https://datatracker.ietf.org/doc/html/rfc4122.html
[23847]: prestodb/presto#23847
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants